
    Comparison of phylogenetic trees through alignment of embedded evolutionary distances

    Background: The understanding of evolutionary relationships is a fundamental aspect of modern biology, and the phylogenetic tree is a primary tool for describing these associations. However, comparing trees to assess their similarity and to quantify various biological processes remains a significant challenge.
    Results: We describe a novel approach for comparing phylogenetic distance information based on the alignment of representative high-dimensional embeddings (xCEED: Comparison of Embedded Evolutionary Distances). The xCEED methodology, which uses multidimensional scaling and Procrustes-related superimposition, can measure both the global similarity between trees and the incongruities between them. We apply this approach to the prediction of coevolving protein interactions and show that it outperforms the mirrortree, tol-mirrortree, phylogenetic vector projection, and partial correlation approaches. Furthermore, we show its applicability to the detection of horizontal gene transfer events and its potential use in predicting interaction specificity between a pair of multigene families.
    Conclusions: These approaches provide additional tools for the study of phylogenetic trees and associated evolutionary processes. Source code is available at http://gomezlab.bme.unc.edu/tools.
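
The embed-then-superimpose idea can be illustrated with a minimal sketch. It assumes classical (Torgerson) multidimensional scaling and orthogonal Procrustes superimposition with unit-norm scaling; the function names are hypothetical, and this is not the released xCEED code. The overall RMSD stands in for global tree similarity, and large per-taxon residuals would flag incongruent taxa (for example, candidate horizontal transfers).

```python
import numpy as np


def classical_mds(dist: np.ndarray, k: int) -> np.ndarray:
    """Embed an n x n distance matrix in k dimensions (classical/Torgerson MDS)."""
    n = dist.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    # eigh returns eigenvalues in ascending order; keep the k largest
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]
    # Small negative eigenvalues (non-Euclidean noise) are clipped to zero
    lam = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(lam)


def procrustes_superimpose(x: np.ndarray, y: np.ndarray) -> tuple[float, np.ndarray]:
    """Rotate/reflect y onto x; return overall RMSD and per-taxon residuals."""
    # Centre both embeddings and scale them to unit Frobenius norm,
    # so overall tree length (e.g. evolutionary rate) does not dominate
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    xc = xc / np.linalg.norm(xc)
    yc = yc / np.linalg.norm(yc)
    # Orthogonal Procrustes: R = U V^T from the SVD of Y^T X minimizes ||Y R - X||_F
    u, _, vt = np.linalg.svd(yc.T @ xc)
    residual = yc @ (u @ vt) - xc
    per_taxon = np.linalg.norm(residual, axis=1)
    rmsd = float(np.sqrt(np.mean(per_taxon ** 2)))
    return rmsd, per_taxon


# Toy usage: two distance matrices over the same 6 taxa, the second with
# uniformly doubled distances, should superimpose almost perfectly.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3))
d1 = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
d2 = 2.0 * d1
rmsd, per_taxon = procrustes_superimpose(classical_mds(d1, 3), classical_mds(d2, 3))
print(f"RMSD = {rmsd:.2e}")
```

Orthogonal matrices (which include reflections) are allowed in the superimposition because MDS axes are only defined up to sign and rotation.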

    Analysis of AML genes in dysregulated molecular networks

    Background: Identifying disease-causing genes and understanding their molecular mechanisms are essential for developing effective therapeutics. Several computational methods have therefore been proposed to prioritize candidate disease genes by integrating different data types, including sequence information, biomedical literature, and pathway information. More recently, molecular interaction networks have been incorporated to predict disease genes, but most of these methods do not use the valuable disease-specific information available in mRNA expression profiles of patient samples.
    Results: By integrating protein-protein interaction networks with gene expression profiles of acute myeloid leukemia (AML) patients, we identified subnetworks of interacting proteins that are dysregulated in AML and characterized the known mutated genes causally implicated in AML that are embedded in these subnetworks. The analysis shows that the extracted subnetworks are a reservoir rich in AML genes, reflecting key leukemogenic processes such as myeloid differentiation.
    Conclusion: We showed that an integrative approach using both gene expression profiles and molecular networks can identify AML-causing genes, most of which were not detectable by gene expression analysis alone because their mRNA levels change only slightly.
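
One common way to extract dysregulated subnetworks is a greedy search over the interaction network, growing from a seed gene while the subnetwork score improves. The sketch below assumes that each gene already has a dysregulation score (for example, an absolute t-statistic between AML and normal samples) and uses the mean member score as the objective; both the objective and the function name are hypothetical illustrations, not the authors' exact procedure.

```python
def greedy_subnetwork(seed: str,
                      adjacency: dict[str, set[str]],
                      score: dict[str, float]) -> set[str]:
    """Grow a connected gene set from `seed` while adding a neighbour raises its mean score."""
    members = {seed}
    current = score[seed]
    while True:
        # Candidates are PPI neighbours of any current member
        frontier = {n for m in members for n in adjacency.get(m, set())} - members
        best, best_mean = None, current
        for cand in sorted(frontier):  # sorted for deterministic tie-breaking
            new_mean = (current * len(members) + score.get(cand, 0.0)) / (len(members) + 1)
            if new_mean > best_mean:
                best, best_mean = cand, new_mean
        if best is None:  # no neighbour improves the subnetwork score
            return members
        members.add(best)
        current = best_mean


# Hypothetical toy network and per-gene dysregulation scores
ppi = {"A": {"B", "E"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}, "E": {"A"}}
dysregulation = {"A": 3.0, "B": 2.5, "C": 2.8, "D": 0.2, "E": 0.5}
print(sorted(greedy_subnetwork("B", ppi, dysregulation)))  # ['A', 'B', 'C']
```

Because the search only follows network edges, a weakly dysregulated gene can still enter a subnetwork when it bridges strongly dysregulated partners, which is one way genes with minor mRNA changes can surface.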

    A quantitative approach to study indirect effects among disease proteins in the human protein interaction network

    Background: Systems biology makes it possible to study larger and more intricate systems than before, so the molecular basis of several diseases can now be examined in parallel. Analyzing the protein interaction network of the cell may be key to understanding how complex processes lead to disease. Novel network-analysis tools make it possible to quantify the key interacting proteins in large networks as well as the proteins that connect them. Here we propose a new method to study the relationship between the topology and the functionality of the protein-protein interaction (PPI) network by identifying key mediator proteins that may maintain indirect relationships among proteins causing various diseases.
    Results: Based on the i2d and OMIM databases, we constructed (i) a network of proteins causing five selected diseases (DP, disease proteins) plus their interacting partners (IP, non-disease proteins), the DPIP network, and (ii) a protein network showing only these IPs and their interactions, the IP network. The five diseases investigated were (1) various cancers, (2) heart diseases, (3) obesity, (4) diabetes and (5) autism. We quantified the number and strength of IP-mediated indirect effects between the five groups of disease proteins and hypothetically identified the most important mediator proteins linking heart disease to obesity or diabetes in the IP network. The results show the relationship between mediator role and centrality, as well as between mediator role and the functional properties of these proteins.
    Conclusions: We show that a protein that plays an important indirect mediator role between two diseases is not necessarily a hub in the PPI network. This suggests that, although hub proteins and disease proteins are obviously of great interest, mediators may also deserve more attention, especially if disease-disease associations are to be understood. Identifying the hubs alone may not be sufficient to understand particular pathways. We found that the mediators between heart diseases and obesity, as well as between heart diseases and diabetes, are of relatively high functional importance in the cell. The mediator proteins suggested here should be tested experimentally as hypothetical disease-related proteins.
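
To make the distinction between mediators and hubs concrete, here is a deliberately simplified sketch. It assumes that a mediator's role can be approximated by the number of disease-protein pairs (one from each disease) that it connects through a two-step path a-IP-b; the authors' indirect-effect measure may also weight interaction strength and longer paths. The function name and the toy proteins are hypothetical.

```python
from collections import defaultdict


def mediator_scores(edges, group_a, group_b, mediators):
    """For each mediator (IP) node, count disease-protein pairs (a, b) it links via a-IP-b.

    Returns {mediator: (two_step_pair_count, degree)} so mediator role can be
    compared with hub-ness (degree).
    """
    adj = defaultdict(set)
    for u, v in edges:  # undirected PPI edges
        adj[u].add(v)
        adj[v].add(u)
    result = {}
    for m in mediators:
        n_a = len(adj[m] & group_a)  # neighbours causing disease A
        n_b = len(adj[m] & group_b)  # neighbours causing disease B
        result[m] = (n_a * n_b, len(adj[m]))
    return result


# Hypothetical toy data: H* = heart-disease proteins, O* = obesity proteins,
# X and Y = non-disease interacting partners (IPs), P* = other proteins.
edges = [("H1", "X"), ("H2", "X"), ("O1", "X"),
         ("H1", "Y"), ("Y", "P1"), ("Y", "P2"), ("Y", "P3"), ("Y", "P4")]
print(mediator_scores(edges, {"H1", "H2"}, {"O1"}, ["X", "Y"]))
# {'X': (2, 3), 'Y': (0, 5)}
```

In this toy network the higher-degree protein Y mediates nothing between the two diseases, while the lower-degree X links two heart-disease proteins to an obesity protein.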

    Exploiting Amino Acid Composition for Predicting Protein-Protein Interactions

    Computational methods for predicting protein interactions typically use protein domains as classifier features because domains capture conserved information about interaction surfaces. However, approaches that rely on domains as features cannot be applied to proteins without any domain information. In this paper, we explore the contribution of pure amino acid composition (AAC) to protein interaction prediction. This simple feature, based on normalized counts of single amino acids or amino acid pairs, is applicable to proteins from any sequenced organism and can compensate for missing domain information.
    AAC performed on par with domain-based protein interaction prediction on three yeast protein interaction datasets. Similar behavior was obtained with different classifiers, indicating that our results depend on the features rather than on the classifiers. AAC also performed comparably on worm and fly datasets. Predicting interactions for the entire yeast proteome identified a large number of novel interactions, the majority of which involved proteins that co-localized or participated in the same processes. Our high-confidence interaction network included both well-studied and uncharacterized proteins. Proteins with known function were involved in actin assembly and cell budding. Uncharacterized proteins interacted with proteins involved in reproduction and cell budding, suggesting putative biological roles for them.
    AAC is a simple yet powerful feature for predicting protein interactions and can be used alone or together with protein domains to predict new interactions and validate existing ones. More importantly, AAC alone performs on par with existing, more complex features, indicating that sequence-level information predictive of interaction exists and is not necessarily restricted to domains.
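
A minimal sketch of AAC features follows. It assumes that "pairs of amino acids" means adjacent residue pairs (dipeptides) and that a candidate interaction is encoded by concatenating the two proteins' vectors in a canonical order; both are assumptions, and the function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aac(seq: str) -> list[float]:
    """Single-residue composition: 20 counts normalized to sum to 1."""
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for ch in seq.upper():
        if ch in counts:  # non-standard letters (X, B, U, ...) are ignored
            counts[ch] += 1
    total = sum(counts.values()) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Composition of adjacent residue pairs: 400 counts normalized to sum to 1."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values()) or 1
    return [counts[d] / total for d in DIPEPTIDES]


def pair_features(seq_a: str, seq_b: str) -> list[float]:
    """Feature vector for a candidate interacting pair (hypothetical encoding).

    Each protein contributes its 20 + 400 composition values; the two vectors
    are put in a canonical (sorted) order so (a, b) and (b, a) are identical.
    """
    va = aac(seq_a) + dipeptide_composition(seq_a)
    vb = aac(seq_b) + dipeptide_composition(seq_b)
    first, second = sorted([va, vb])
    return first + second


print(len(pair_features("MKTAYIAK", "GSHMLE")))  # 840
```

The resulting fixed-length vectors can be passed to any standard classifier, which mirrors the observation that performance depended on the features rather than on the classifier.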

    Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods

    Background: Alanine scanning mutagenesis is a powerful experimental methodology for investigating the structural and energetic characteristics of protein complexes. Individual amino acids are systematically mutated to alanine, and the resulting changes in the free energy of binding (ΔΔG) are measured. Several experiments have shown that protein-protein interactions depend critically on just a few residues ("hot spots") at the interface. Hot spots make a dominant contribution to the free energy of binding, and mutating them can disrupt the interaction. Because mutagenesis studies require significant experimental effort, accurate and reliable computational methods are needed. Such methods would also improve our understanding of the determinants of affinity and specificity in protein-protein recognition.
    Results: We present a novel computational strategy to identify hot spot residues, given the structure of a complex. We consider the basic energetic terms that contribute to hot spot interactions, i.e. van der Waals potentials, solvation energy, hydrogen bonds and Coulomb electrostatics. We treat them as input features and use machine learning algorithms such as Support Vector Machines and Gaussian Processes to combine and integrate them optimally, based on a set of training examples of alanine mutations. We show that our approach is effective in predicting hot spots and compares favourably with other available methods. In particular, we obtain the best performance using Transductive Support Vector Machines, a semi-supervised learning scheme. When hot spots are defined as residues with ΔΔG ≥ 2 kcal/mol, our method achieves a precision of 56% and a recall of 65%.
    Conclusion: We have developed a hybrid scheme in which energy terms are used as input features of machine learning models. This strategy combines the strengths of machine learning and energy-based methods. Although these two types of approaches have so far mainly been applied separately to biomolecular problems, our results indicate that there are substantial benefits to be gained by integrating them.
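
The reported figures combine a labeling rule (ΔΔG ≥ 2 kcal/mol) with standard precision and recall. The sketch below shows that evaluation step only. The residue identifiers, measured values, and predictions are hypothetical, and it does not reproduce the energy-term classifier itself.

```python
def hot_spots(ddg: dict[str, float], threshold: float = 2.0) -> set[str]:
    """Residues whose alanine mutation raises binding free energy by >= threshold (kcal/mol)."""
    return {res for res, value in ddg.items() if value >= threshold}


def precision_recall(predicted: set[str], actual: set[str]) -> tuple[float, float]:
    """Precision = TP / predicted positives; recall = TP / actual positives."""
    tp = len(predicted & actual)  # correctly predicted hot spots
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical measured ddG values (kcal/mol) and a hypothetical predictor output
measured = {"R12": 2.4, "Y45": 3.1, "D50": 0.8, "W60": 2.0, "K70": 1.1}
predicted = {"R12", "D50", "W60", "K70"}
p, r = precision_recall(predicted, hot_spots(measured))
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```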

    Structural and Functional Roles of Coevolved Sites in Proteins

    Understanding residue covariation between multiple positions in protein families is crucial and can help in designing protein engineering experiments. These simultaneous changes, or residue coevolution, allow a protein to maintain its overall structural and functional integrity while acquiring specific functional modifications. Despite significant efforts in the field, there is still controversy about the preferred locations of coevolved residues in different regions of protein molecules, the strength of the coevolutionary signal, and the role of coevolution in functional diversification.
    In this paper we study the scale and nature of residue coevolution in maintaining the overall functionality and structural integrity of proteins. We carried out a large-scale study of the structural and functional aspects of coevolved residues. We found that the networks representing coevolutionary residue connections in our dataset are generally of the 'small-world' type: their clustering coefficients are higher than those of random networks, and their mean shortest path lengths are similar to or lower than those of random and regular networks. We also found that, overall, 11% of functionally important sites coevolve with at least one other site. Active sites coevolve with other sites more frequently (15%) than protein-binding (11%) and ligand-binding (9%) sites. Metal-binding sites and active sites also coevolve more frequently with other metal-binding sites and active sites, respectively. Analysis of the coupling between coevolutionary processes and the spatial distribution of coevolved sites reveals that a high fraction of coevolved sites are located close to each other. Moreover, approximately 80% of charge-compensatory substitutions within coevolved sites occur in very close spatial proximity (≤ 5 Å), pointing to the possible preservation of salt bridges during evolution.
    Our findings show that a noticeable fraction of functionally important sites undergo coevolution, and they point to compensatory substitutions as a probable coevolutionary mechanism within spatially proximal coevolved functional sites.
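
The small-world test compares two network statistics against a random baseline: the average clustering coefficient and the mean shortest path length. The sketch below computes both with breadth-first search and builds an Erdős-Rényi G(n, m) baseline with the same numbers of nodes and edges. The residue network is hypothetical, and averaging path lengths only over reachable pairs is an assumption for disconnected networks.

```python
import random
from collections import deque
from itertools import combinations


def clustering_coefficient(adj: dict[str, set[str]]) -> float:
    """Average local clustering coefficient (nodes with degree < 2 count as 0)."""
    coeffs = []
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


def mean_shortest_path(adj: dict[str, set[str]]) -> float:
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())  # distance from src to itself is 0
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


def random_graph(nodes: list[str], n_edges: int, seed: int = 0) -> dict[str, set[str]]:
    """Erdos-Renyi G(n, m) baseline with the same number of nodes and edges."""
    rng = random.Random(seed)
    adj = {n: set() for n in nodes}
    for u, v in rng.sample(list(combinations(nodes, 2)), n_edges):
        adj[u].add(v)
        adj[v].add(u)
    return adj


# Hypothetical coevolution network: two tightly coupled residue clusters joined by one link
net = {
    "r1": {"r2", "r3", "r4"}, "r2": {"r1", "r3"}, "r3": {"r1", "r2"},
    "r4": {"r1", "r5", "r6"}, "r5": {"r4", "r6"}, "r6": {"r4", "r5"},
}
baseline = random_graph(list(net), n_edges=7)
print(clustering_coefficient(net), mean_shortest_path(net))
print(clustering_coefficient(baseline), mean_shortest_path(baseline))
```

In practice the baseline would be averaged over many random seeds rather than a single draw.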

    A Comprehensive Analysis of the Dynamic Biological Networks in HCV Induced Hepatocarcinogenesis

    Hepatocellular carcinoma (HCC) is a primary malignancy of the liver that is closely associated with hepatitis C virus (HCV) infection and cirrhosis. The molecular mechanisms underlying HCV-induced hepatocarcinogenesis remain to be clarified from a systems biology standpoint. By integrating data on protein-protein interactions, transcriptional regulation, and disease-related microarray analyses, we carried out a dynamic biological network analysis of the progression of HCV-induced hepatocarcinogenesis and systematically explored potential disease-related mechanisms from a network perspective. Dysfunctional interactions among proteins and deregulated relationships between transcription factors and their target genes could contribute to the occurrence and progression of this disease. The study included six pathologically defined stages in the development and progression of HCC after HCV infection. We constructed disease-related biological networks for each disease stage and identified progression-related subnetworks that may play roles in the corresponding developmental stage and participate in later stages of cancer progression. In addition, we identified novel risk factors related to HCC based on an analysis of the progression-related subnetworks. The dynamic characteristics of the networks reflect important features of disease development and progression and provide information that can guide further exploration of the underlying mechanisms.

    A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation

    Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores (“Viterbi” scores) are Gumbel-distributed with constant λ = log 2, and the high-scoring tail of Forward scores is exponential with the same constant λ. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments.
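
With λ fixed at log 2 for bit scores, converting a score to an E-value needs only a location parameter: μ for the Gumbel distribution of Viterbi scores and τ for the exponential tail of Forward scores. The sketch below assumes those location parameters have already been estimated (for example, by a short simulation per model); the values used are hypothetical.

```python
import math

LAMBDA_BITS = math.log(2)  # conjectured constant lambda for bit scores


def viterbi_pvalue(score: float, mu: float, lam: float = LAMBDA_BITS) -> float:
    """Gumbel survival function P(S > score) for optimal (Viterbi) bit scores."""
    # 1 - exp(-exp(-lam * (score - mu))), computed with expm1 for precision at high scores
    return -math.expm1(-math.exp(-lam * (score - mu)))


def forward_pvalue(score: float, tau: float, lam: float = LAMBDA_BITS) -> float:
    """Exponential high-scoring tail P(S > score) for Forward bit scores."""
    # Only meaningful in the tail (score >= tau); capped at 1 below it
    return min(1.0, math.exp(-lam * (score - tau)))


def e_value(pvalue: float, n_comparisons: int) -> float:
    """Expected number of chance hits at least this good in a database search."""
    return pvalue * n_comparisons


mu, tau = -4.0, -3.5  # hypothetical location parameters for one profile
for bits in (10.0, 20.0, 30.0):
    print(bits,
          e_value(viterbi_pvalue(bits, mu), 1_000_000),
          e_value(forward_pvalue(bits, tau), 1_000_000))
```

Because λ = log 2, each additional bit of score halves the tail probability, which is what makes the fixed-λ conjecture convenient.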

    Breakpoint mapping of 13 large parkin deletions/duplications reveals an exon 4 deletion and an exon 7 duplication as founder mutations

    Early-onset Parkinson’s disease (EOPD) has been associated with recessive mutations in parkin (PARK2). About half of the mutations found in parkin are genomic rearrangements, i.e., large deletions or duplications. Although many different parkin rearrangements have been reported, their exact breakpoints have rarely been mapped. In the present study, we used real-time quantitative polymerase chain reaction (PCR), long-range PCR and sequence analysis to map the exact breakpoints of 13 different parkin deletions/duplications detected by Multiplex Ligation-dependent Probe Amplification (MLPA) analysis in 13 of 116 screened EOPD patients. Deletion/duplication-specific PCR tests were developed as a rapid, low-cost tool to confirm MLPA results and to test family members or patients with similar parkin deletions/duplications. In addition to several different deletions, an exon 3 deletion, an exon 4 deletion and an exon 7 duplication were found in multiple families. Haplotype analysis in four families revealed a common haplotype of 1.2 Mb for the exon 7 duplication and a common haplotype of 6.3 Mb for the exon 4 deletion. These findings suggest founder effects for distinct large rearrangements in parkin.
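
MLPA copy-number calls are usually made from a dosage quotient: a probe's signal normalized to reference probes and divided by the same ratio in a control sample. The sketch below illustrates that step. The cut-offs (0.2, 0.7, 1.3) are assumed values, since thresholds vary between laboratories and analysis software, and the peak values are hypothetical.

```python
def dosage_quotient(probe: float, ref: float, ctrl_probe: float, ctrl_ref: float) -> float:
    """Probe peak normalized to reference probes, relative to the same ratio in a control."""
    return (probe / ref) / (ctrl_probe / ctrl_ref)


def call_copy_number(dq: float, null_cut: float = 0.2,
                     del_cut: float = 0.7, dup_cut: float = 1.3) -> str:
    # Expected DQ: ~0 homozygous deletion, ~0.5 heterozygous deletion,
    # ~1.0 normal (two copies), ~1.5 heterozygous duplication.
    if dq < null_cut:
        return "homozygous deletion"
    if dq < del_cut:
        return "heterozygous deletion"
    if dq > dup_cut:
        return "duplication"
    return "normal"


# Hypothetical peak areas (probe, reference) for three parkin exon probes
patient = {"exon3": (0.0, 1000.0), "exon4": (480.0, 1000.0), "exon7": (1560.0, 1000.0)}
control = {"exon3": (1000.0, 1000.0), "exon4": (1000.0, 1000.0), "exon7": (1000.0, 1000.0)}
for exon, (probe, ref) in patient.items():
    dq = dosage_quotient(probe, ref, *control[exon])
    print(exon, round(dq, 2), call_copy_number(dq))
```

Calls like these only flag an altered exon dosage; mapping the actual breakpoints still requires the PCR-based approaches described above.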